117 research outputs found

Text detection and recognition in images and video sequences

Author: Chen Datong
Publication date: 10/03/2006

Text characters embedded in images and video sequences represent a rich source of information for content-based indexing and retrieval applications. However, these text characters are difficult to detect and recognize because of their varying sizes, grayscale values and complex backgrounds. This thesis investigates methods for building an efficient application system for detecting and recognizing text of any grayscale value embedded in images and video sequences. Both empirical image processing methods and statistical machine learning and modeling approaches are studied in two sub-problems: text detection and text recognition. Applying machine learning methods to text detection is difficult because of variations in character size and grayscale and the heavy computational cost. To overcome these problems, we propose a two-step localization/verification approach. The first step aims at quickly localizing candidate text lines, enabling the normalization of characters to a single size. In the verification step, a trained support vector machine or multi-layer perceptron is applied to background-independent features to remove false alarms. Text recognition, even from the detected text lines, remains a challenging problem due to the variety of fonts and colors, the presence of complex backgrounds and the short length of the text strings. Two schemes are investigated to address the text recognition problem: a bi-modal enhancement scheme and a multi-modal segmentation scheme. In the bi-modal scheme, we propose a set of filters to enhance the contrast of black and white characters and produce a better binarization before recognition. For more general cases, text recognition is addressed by a text segmentation step followed by a traditional optical character recognition (OCR) algorithm within a multi-hypothesis framework. In the segmentation step, we model the distribution of the grayscale values of pixels using a Gaussian mixture model or a Markov random field.
The resulting multiple segmentation hypotheses are post-processed by a connected component analysis and a grayscale consistency constraint algorithm. Finally, they are processed by OCR software. A selection algorithm based on language modeling and OCR statistics chooses the text result from all the produced text strings. Additionally, methods for using the temporal information of video text are investigated. A Monte Carlo video text segmentation method is proposed for adapting the segmentation parameters across successive text frames. Furthermore, a ROVER (Recognizer Output Voting Error Reduction) algorithm is studied for improving the final recognized text string by voting on characters across frames.
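The idea of voting on characters across frames can be illustrated with a minimal sketch. This is not the thesis's ROVER implementation: it assumes the per-frame OCR strings are already aligned and of equal length, whereas ROVER proper aligns the hypotheses before voting. The function name `vote_characters` is hypothetical.

```python
from collections import Counter


def vote_characters(frame_strings):
    """Majority-vote each character position across per-frame OCR outputs.

    Simplifying assumption (not from the source): all strings have the same
    length and are already aligned position by position.
    """
    if not frame_strings:
        return ""
    length = len(frame_strings[0])
    if any(len(s) != length for s in frame_strings):
        raise ValueError("this sketch expects equal-length, pre-aligned strings")

    voted = []
    for position in range(length):
        counts = Counter(s[position] for s in frame_strings)
        # Take the most frequent character; ties go to the one seen first.
        voted.append(counts.most_common(1)[0][0])
    return "".join(voted)
```

For example, voting over the three frame outputs `"GOAL"`, `"G0AL"` and `"GOAI"` corrects the isolated errors in the second and third frames.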

Infoscience - École polytechnique fédérale de Lausanne

Multiple Hypotheses Video OCR

Authors: Chen Datong, Luettin Juergen
Publication venue: IDIAP
Publication date: 10/03/2006

In this paper, we present a method to improve video OCR with multiple character hypotheses. Text regions in video need to be binarized before they can serve as input to current OCR systems. Traditional binarization methods do not use any structural information about the text. Based on a statistical model, we define a binarization method, called an observation function, that must satisfy a certain condition. We then present a method to construct an observation function by computing binarization results according to multiple character hypotheses obtained from an OCR system.

Infoscience - École polytechnique fédérale de Lausanne

A Survey of Text Detection and Recognition in Images and Videos

Authors: Chen Datong, Luettin Juergen
Publication venue: IDIAP
Publication date: 10/03/2006

A survey of text detection and recognition in images and videos, covering state-of-the-art methods and systems.

Infoscience - École polytechnique fédérale de Lausanne

Video OCR for Sport Video Annotation and Retrieval

Authors: Bourlard Hervé, Chen Datong
Publication venue: IDIAP
Publication date: 10/03/2006

This paper presents a video OCR system that automatically extracts closed captions from video frames as keywords (which we call "cues") for building annotations of sport videos. In this system, text regions that contain closed captions are first identified using support vector machines (SVMs). We then enhance the identified text regions using two groups of asymmetric filters and recognize them using a commercial OCR software package. The resulting captions are recorded as cues in XML format for video annotation and retrieval tasks.

Infoscience - École polytechnique fédérale de Lausanne

Asymmetric Filter for Text Recognition in Video

Authors: Chen Datong, Shearer Kim
Publication venue: IDIAP
Publication date: 10/03/2006

Stripes are a common sub-structure of text characters, and the scale of the stripes does not vary significantly within a character. In this paper, a new form of filter is derived from the Gabor filter which can efficiently estimate the scales of such stripes. The contrast of text in video can then be increased by enhancing the edges of those stripes found to have a suitable scale. The algorithm presented enhances the stripes in three selected scale ranges. Character recognition is then performed on the binarized enhanced images and shows improvement over other methods.

Infoscience - École polytechnique fédérale de Lausanne

Video Text Segmentation Using Particle Filters

Authors: Chen Datong, Odobez Jean-Marc
Publication venue: IDIAP
Publication date: 10/03/2006

This paper presents a probabilistic algorithm for segmenting and recognizing text embedded in video sequences based on adaptive thresholding using a Bayes filtering method. The algorithm approximates the posterior distribution of the segmentation thresholds of video text by a set of weighted samples. The set of samples is initialized by applying a classical segmentation algorithm to the first video frame and is further refined by random sampling under a temporal Bayesian framework. This framework allows us to evaluate a text image segmenter on the basis of recognition results rather than visual segmentation results, which is directly relevant to our character recognition task. Results on a database of 6944 images demonstrate the validity of the algorithm.

Infoscience - École polytechnique fédérale de Lausanne

A New Method of Contrast Normalization for Verification of Extracted Video Text Having Complex Backgrounds

Authors: Chen Datong, Odobez Jean-Marc
Publication venue: IDIAP
Publication date: 10/03/2006

One of the difficulties of extracting text contained in images or videos comes from the variation in the grayscale values of the text and backgrounds. In this paper, we propose a new method to normalize the contrast between text characters and backgrounds so that a trained machine learning tool can verify characters with grayscale values that have never been seen before. Experiments show that the proposed method, used in training either a multilayer perceptron or a support vector machine, yields better text verification than other typical contrast measures.

Infoscience - École polytechnique fédérale de Lausanne

Sequential Monte Carlo Video Text Segmentation

Authors: Chen Datong, Odobez Jean-Marc
Publication date: 10/03/2006

This paper presents a probabilistic algorithm for segmenting text embedded in video based on Monte Carlo sampling. The algorithm approximates the posterior of the segmentation thresholds of video text by a set of weighted samples, referred to as particles. The set of samples is initialized by applying a traditional segmentation algorithm to the first video frame and is further refined by random sampling under a temporal Bayesian framework. Results on a database of 6944 images demonstrate the validity of the algorithm.
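A generic particle filter over a single segmentation threshold might look like the sketch below. It is an illustration of the weighted-sample idea only, not the authors' algorithm: the Gaussian random-walk transition, the resampling scheme, the function names and the `toy_score` likelihood are all assumptions introduced here. In a real system the score would come from some measure of segmentation or recognition quality.

```python
import math
import random


def toy_score(frame, threshold, width=10.0):
    """Hypothetical stand-in likelihood: here a 'frame' is simply the ideal
    threshold value, and the score peaks when the particle matches it."""
    return math.exp(-((threshold - frame) ** 2) / (2.0 * width ** 2))


def track_threshold(frames, score, init_threshold,
                    n_particles=50, noise=5.0, seed=0):
    """Return one weighted-mean threshold estimate per frame.

    score(frame, threshold) must return a non-negative quality value.
    """
    rng = random.Random(seed)
    # Initialization: all particles start at the threshold found on frame one.
    particles = [float(init_threshold)] * n_particles
    estimates = []
    for frame in frames:
        # Prediction: random-walk diffusion, clamped to the 8-bit gray range.
        particles = [min(255.0, max(0.0, p + rng.gauss(0.0, noise)))
                     for p in particles]
        # Update: weight each particle by how well it scores on this frame.
        weights = [score(frame, p) for p in particles]
        total = sum(weights)
        if total <= 0.0:
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(p * w for p, w in zip(particles, weights)))
        # Resampling: draw a new set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Calling `track_threshold([120.0] * 20, toy_score, init_threshold=100.0)` lets the particle set drift from the initial guess of 100 toward the toy optimum of 120 over successive frames.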

Infoscience - École polytechnique fédérale de Lausanne

Video Text Recognition Based on Markov Random Field and Grayscale Consistency Constraint

Authors: Chen Datong, Odobez Jean-Marc
Publication venue: IDIAP
Publication date: 10/03/2006

A method for segmenting and recognizing text embedded in video and images is proposed in this paper. In the method, multiple segmentation hypotheses of the text image are first generated based on an MRF model. Background regions in each hypothesis are then removed by using a grayscale consistency constraint (GCC) in a connected component analysis procedure before being processed by optical character recognition (OCR) software.
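The abstract does not define the grayscale consistency constraint, so the following sketch only illustrates the general shape of such a step under stated assumptions: components are labeled with 4-connectivity, and a component is kept when its mean gray value lies within a tolerance of the median component mean. Both the rule and the names `connected_components` and `filter_by_gray_consistency` are hypothetical.

```python
from collections import deque
from statistics import median


def connected_components(mask):
    """Return the 4-connected foreground components of a binary mask,
    each as a list of (row, col) pixels."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def filter_by_gray_consistency(mask, gray, tolerance=30.0):
    """Keep only components whose mean gray value is close to the median
    component mean (an assumed consistency rule, not the paper's GCC)."""
    cleaned = [[0] * len(row) for row in mask]
    components = connected_components(mask)
    if not components:
        return cleaned
    means = [sum(gray[y][x] for y, x in comp) / len(comp)
             for comp in components]
    reference = median(means)
    for comp, mean in zip(components, means):
        if abs(mean - reference) <= tolerance:
            for y, x in comp:
                cleaned[y][x] = 1
    return cleaned
```

The median is used as the reference so that a single bright or dark background blob does not shift the value that the character components are compared against.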

Infoscience - École polytechnique fédérale de Lausanne

A Localization/Verification Scheme for Finding Text in Images and Video Frames Based on Contrast Independent Features and Machine Learning Methods

Authors: Chen Datong, Odobez Jean-Marc
Publication venue: IDIAP
Publication date: 10/03/2006

Automatic character detection in video sequences is a complex task because of the variety of sizes and colors as well as the complexity of the background. In this paper, we address this problem by proposing a localization/verification scheme. Candidate text regions are first localized using a fast algorithm with a very low rejection rate, which enables character size normalization. Contrast-independent features are then proposed for training machine learning tools to verify the text regions. Two kinds of machine learning tools, multilayer perceptrons and support vector machines, are compared based on four different features in the verification task. This scheme provides fast text detection in images and videos at a low computational cost compared with traditional methods.

Infoscience - École polytechnique fédérale de Lausanne